Introduction
The aim of this assignment was to demonstrate the use of basic statistical plots in R.
The data chosen was Uber API data. The start latitude and longitude, along with the end latitude and longitude, were sent to the Uber API. In return, the Uber API responded with the following fields:
1) The types of Uber available (ranging from UberX and Pool to Lux)
2) The highest cost estimate
3) The lowest cost estimate
4) The estimated cost range (for example, $10-13)
5) The distance between the two points
6) The currency of the location
7) The language of one of the cabs (Español)
The plots included were:
1) Basic statistical plots:
   Scatterplot (Uber name vs highest cost estimate)
   Text (Uber name vs highest cost estimate)
   Bar chart (Uber name vs highest cost estimate)
   Line chart (date vs probable cost)
   Area chart (date vs probable cost)
   Dot plot (highest cost estimate)
   Histogram (highest cost estimate)
   Frequency polygon (highest cost estimate)
   Box plot (Uber name vs extrapolated cost estimate, e.g. 8-10 became 8, 9, 10)
   Violin plot (Uber name vs extrapolated cost estimate, e.g. 8-10 became 8, 9, 10)
2) A faceted plot (Uber name, highest cost estimate, lowest cost estimate)
3) A ggmap plot (start location and end location)
4) A Plotly bar plot (Uber name vs highest cost estimate)
The use of ggplot2, ggmap and plotly was demonstrated in RStudio. A copy of the notebook was published on RPubs (link: https://rpubs.com/AnmolChawla/assignment10). An R project with a separate data folder was created, and incremental git commits were made to ensure good practice. The notebook was formatted with R Markdown.
# Basic Plots
df <- read.csv("C:/Users/anmol/Desktop/INF 554/Homework 10/r_assignement/data/uber.csv")
mydata <- read.csv("C:/Users/anmol/Desktop/INF 554/Homework 10/r_assignement/data/uber_data1.csv")
mydata['date']<-as.Date(mydata$date)
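The `ï..` prefix on the first column name in the printed output of `df` and `mydata` (`ï..localized_display_name`, `ï..serial`) is what `read.csv` usually shows when a CSV saved with a UTF-8 byte-order mark is read on a Windows locale. A minimal sketch of one way to avoid it is to pass `fileEncoding = "UTF-8-BOM"`. The temporary file and its contents below are made up for illustration and are not the assignment data.

```r
# Hypothetical stand-in for a CSV exported with a UTF-8 byte-order mark (BOM).
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xEF, 0xBB, 0xBF)), con)  # the three BOM bytes
writeBin(charToRaw("serial,name,estimate,date\n1,UberXL,10,2018-10-01\n"), con)
close(con)

# "UTF-8-BOM" asks R to drop the BOM, so the first header stays clean.
clean <- read.csv(tmp, fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
clean$date <- as.Date(clean$date)  # same date conversion as in the notebook

names(clean)
unlink(tmp)
```

The same argument could be added to the two `read.csv` calls above, assuming the files really do start with a BOM.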
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.4
library(plotly)
## Warning: package 'plotly' was built under R version 3.4.4
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(ggmap)
## Warning: package 'ggmap' was built under R version 3.4.4
##
## Attaching package: 'ggmap'
## The following object is masked from 'package:plotly':
##
## wind
df
## ï..localized_display_name distance display_name
## 1 UberXL 1.59 UberXL
## 2 Black SUV 1.59 Black SUV
## 3 Pool 1.59 Pool
## 4 UberX 1.59 UberX
## 5 Espanol 1.59 Espanol
## 6 Select 1.59 Select
## 7 Black 1.59 Black
## 8 Assist 1.59 Assist
## 9 WAV 1.59 WAV
## 10 Lux 1.59 Lux
## product_id high_estimate low_estimate
## 1 9502f87d-e0d0-488d-b84f-b8537538c339 13 10
## 2 16ecc8ec-7fe5-4c5f-9c68-ff9c696f7d5f 30 23
## 3 e61cd68e-bf67-405e-94c1-4993017c6afe 11 7
## 4 2143f90b-ce68-4f6d-a113-4872b207e626 11 8
## 5 1c770649-4755-45d0-b1de-e8cc6e8639bb 11 8
## 6 3bbfad48-dd77-45dc-9bb0-593821f1a5dd 17 13
## 7 e4578b16-6714-4cba-a131-f8cb56ad4555 20 15
## 8 876a0e0f-e232-4131-a6e2-355db8045030 11 8
## 9 aef7503f-8ab9-4470-8ed9-63d797fa721e 11 8
## 10 dee3ccd0-1736-4397-8799-53eb21ffe92e 36 29
## duration estimate currency_code
## 1 480 $10-13 USD
## 2 480 $23-30 USD
## 3 480 $7-10 USD
## 4 480 $8-11 USD
## 5 480 $8-11 USD
## 6 480 $13-17 USD
## 7 480 $15-20 USD
## 8 480 $8-11 USD
## 9 480 $8-11 USD
## 10 480 $29-36 USD
mydata
## ï..serial name estimate date X
## 1 1 UberXL 10 2018-10-01 NA
## 2 2 UberXL 11 2018-10-02 NA
## 3 3 UberXL 12 2018-10-03 NA
## 4 4 UberXL 13 2018-10-04 NA
## 5 5 UberXL 14 2018-10-05 NA
## 6 6 Black SUV 23 2018-10-06 NA
## 7 7 Black SUV 24 2018-10-07 NA
## 8 8 Black SUV 25 2018-10-08 NA
## 9 9 Black SUV 26 2018-10-09 NA
## 10 10 Black SUV 27 2018-10-10 NA
## 11 11 Pool 7 2018-10-11 NA
## 12 12 Pool 8 2018-10-12 NA
## 13 13 Pool 9 2018-10-13 NA
## 14 14 Pool 10 2018-10-14 NA
## 15 15 Pool 11 2018-10-15 NA
## 16 16 UberX 8 2018-10-16 NA
## 17 17 UberX 9 2018-10-17 NA
## 18 18 UberX 10 2018-10-18 NA
## 19 19 UberX 11 2018-10-19 NA
## 20 20 UberX 12 2018-10-20 NA
## 21 21 Español 8 2018-10-21 NA
## 22 22 Español 9 2018-10-22 NA
## 23 23 Español 10 2018-10-23 NA
## 24 24 Español 11 2018-10-24 NA
## 25 25 Español 12 2018-10-25 NA
## 26 26 Select 13 2018-10-26 NA
## 27 27 Select 14 2018-10-27 NA
## 28 28 Select 15 2018-10-28 NA
## 29 29 Select 16 2018-10-29 NA
## 30 30 Select 17 2018-10-30 NA
## 31 31 Black 15 2018-10-31 NA
## 32 32 Black 16 2018-11-01 NA
## 33 33 Black 17 2018-11-02 NA
## 34 34 Black 18 2018-11-03 NA
## 35 35 Black 19 2018-11-04 NA
## 36 36 Assist 8 2018-11-05 NA
## 37 37 Assist 9 2018-11-06 NA
## 38 38 Assist 10 2018-11-07 NA
## 39 39 Assist 11 2018-11-08 NA
## 40 40 Assist 12 2018-11-09 NA
## 41 41 WAV 8 2018-11-10 NA
## 42 42 WAV 9 2018-11-11 NA
## 43 43 WAV 10 2018-11-12 NA
## 44 44 WAV 11 2018-11-13 NA
## 45 45 WAV 12 2018-11-14 NA
## 46 46 Lux 29 2018-11-15 NA
## 47 47 Lux 30 2018-11-16 NA
## 48 48 Lux 31 2018-11-17 NA
## 49 49 Lux 32 2018-11-18 NA
## 50 50 Lux 33 2018-11-19 NA
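How `uber_data1.csv` was built is not stated, but the printed values are consistent with taking five consecutive dollar values starting at each option's `low_estimate` and assigning one calendar day per row from 2018-10-01. The sketch below reproduces that pattern under this assumption; `uber_low`, `days_per_option` and `expanded` are illustrative names, and the option names are taken from `df` (so `Espanol` appears without the tilde).

```r
# low_estimate values copied from the printed df.
uber_low <- data.frame(
  display_name = c("UberXL", "Black SUV", "Pool", "UberX", "Espanol",
                   "Select", "Black", "Assist", "WAV", "Lux"),
  low_estimate = c(10, 23, 7, 8, 8, 13, 15, 8, 8, 29),
  stringsAsFactors = FALSE
)

days_per_option <- 5  # assumption: five days of travel per option
n_rows <- nrow(uber_low) * days_per_option

expanded <- data.frame(
  serial = seq_len(n_rows),
  name = rep(uber_low$display_name, each = days_per_option),
  # low, low + 1, ..., low + 4 for every option
  estimate = rep(uber_low$low_estimate, each = days_per_option) +
    rep(0:(days_per_option - 1), times = nrow(uber_low)),
  # one consecutive calendar day per row
  date = as.Date("2018-10-01") + 0:(n_rows - 1),
  stringsAsFactors = FALSE
)

head(expanded)
```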
Basic Plots
# Scatter plot
# display_name = the name that gets displayed to the user in the Uber app
# high_estimate = the highest cost estimate that the user can be charged
ggplot(df, aes(display_name, high_estimate)) + geom_point()
# Text
# display_name = the name that gets displayed to the user in the Uber app
# high_estimate = the highest cost estimate that the user can be charged
# The text shows the lowest-to-highest cost estimate for that ride.
ggplot(df, aes(display_name, high_estimate)) + geom_text(aes(label = estimate))
# Bar chart
# display_name = the name that gets displayed to the user in the Uber app
# high_estimate = the highest cost estimate that the user can be charged
ggplot(df, aes(display_name, high_estimate)) + geom_bar(stat = "identity") # bar height is each row's high_estimate
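As a possible variation on the bar chart, `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`, and `reorder()` can sort the bars by cost so the cheapest and most expensive options are easy to spot. This is only a sketch: the `uber` data frame below is retyped from the printed `df`, and the axis labels are suggestions.

```r
library(ggplot2)

# Values copied from the printed df.
uber <- data.frame(
  display_name = c("UberXL", "Black SUV", "Pool", "UberX", "Espanol",
                   "Select", "Black", "Assist", "WAV", "Lux"),
  high_estimate = c(13, 30, 11, 11, 11, 17, 20, 11, 11, 36)
)

# reorder() sorts the x-axis categories by their high_estimate.
p_bar <- ggplot(uber, aes(reorder(display_name, high_estimate), high_estimate)) +
  geom_col() +
  labs(x = "Uber option", y = "Highest cost estimate (USD)")
p_bar
```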
# Line chart
# Use case: a rider travels with every available Uber option for five days each, making fifty days of travel.
# date = the date of travel
# estimate = the approximate cost the rider had to pay
ggplot(mydata, aes(date, estimate)) + geom_line()
# Area chart
# Use case: a rider travels with every available Uber option for five days each, making fifty days of travel.
# date = the date of travel
# estimate = the approximate cost the rider had to pay
ggplot(mydata, aes(date, estimate)) + geom_area()
# Dot plot
# high_estimate = the approximate highest cost shown in the app
# count = how many times that high estimate occurs
ggplot(df, aes(x = high_estimate)) + geom_dotplot(binwidth = 1)
# Histogram
# high_estimate = the approximate highest cost shown in the app
# count = how many times that high estimate occurs
ggplot(df, aes(x = high_estimate)) + geom_histogram(binwidth = 1)
# Frequency polygon
# high_estimate = the approximate highest cost shown in the app
# count = how many times that high estimate occurs
ggplot(df, aes(x = high_estimate)) + geom_freqpoly(color = "blue")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
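The `stat_bin()` message appears because no bin width was given to `geom_freqpoly()`, so it falls back to 30 bins. One way to silence it, and to make the polygon comparable with the histogram, is to reuse `binwidth = 1`. The sketch below retypes the `high_estimate` column from the printed `df` so it runs on its own; `high` and `p_freq` are illustrative names.

```r
library(ggplot2)

# high_estimate values copied from the printed df.
high <- data.frame(high_estimate = c(13, 30, 11, 11, 11, 17, 20, 11, 11, 36))

# One-dollar bins, matching the histogram and dot plot.
p_freq <- ggplot(high, aes(x = high_estimate)) +
  geom_freqpoly(binwidth = 1, color = "blue")
p_freq
```

With one-dollar bins the peak of the polygon sits at 11, the high estimate shared by five of the ten options.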
# Box plot
# name: the name of the option displayed in the app
# estimate: the cost associated with that option, ranging from the lowest to the highest cost estimate
ggplot(mydata, aes(name, estimate)) + geom_boxplot()
# Violin plot
# name: the name of the option displayed in the app
# estimate: the cost associated with that option, ranging from the lowest to the highest cost estimate
# Every violin has the same shape because each option's data is evenly spaced in steps of one and each value occurs exactly once.
ggplot(mydata, aes(name, estimate)) + geom_violin()
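The observation about the violin shapes can be checked numerically. A violin is drawn from a kernel density estimate of each group, so groups that differ only by a constant shift should give the same curve at a different position. The sketch below uses base R's `density()` with its defaults as a rough stand-in for what `geom_violin()` computes (an approximation, not its exact internals) and compares two options from the printed `mydata`.

```r
# Two options from the printed mydata: Pool (7..11) and Lux (29..33).
pool <- 7:11
lux <- 29:33

# Kernel density estimates with R's default bandwidth and kernel.
d_pool <- density(pool)
d_lux <- density(lux)

# Same bandwidth and same curve heights: only the position on the cost axis differs.
same_bw <- isTRUE(all.equal(d_pool$bw, d_lux$bw))
same_shape <- isTRUE(all.equal(d_pool$y, d_lux$y, tolerance = 1e-6))
shift <- mean(d_lux$x - d_pool$x)  # how far the Lux curve sits from the Pool curve

c(same_bw = same_bw, same_shape = same_shape, shift = shift)
```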
Faceted Plot
# display_name: the name of the option displayed in the app
# low_estimate: the lowest cost associated with that option
# high_estimate: the highest cost associated with that option
ggplot(df, aes(high_estimate, low_estimate, color = display_name)) +
  geom_point() +
  facet_grid(cols = vars(display_name)) +
  theme(legend.position = "none")
MAP
# GGMAP
# Triangle (shape 17): the start point, plotted at the latitude and longitude given for UCLA
# Circle (default shape): the end point, plotted at the latitude and longitude given for USC
bb <- c(left = -125.39, bottom = 31.0, right = -113.5, top = 42.0)
stamenmap.ca <- get_stamenmap(bbox = bb, zoom = 6, maptype = "toner")
## Map from URL : http://tile.stamen.com/toner/6/9/23.png
## Map from URL : http://tile.stamen.com/toner/6/10/23.png
## Map from URL : http://tile.stamen.com/toner/6/11/23.png
## Map from URL : http://tile.stamen.com/toner/6/9/24.png
## Map from URL : http://tile.stamen.com/toner/6/10/24.png
## Map from URL : http://tile.stamen.com/toner/6/11/24.png
## Map from URL : http://tile.stamen.com/toner/6/9/25.png
## Map from URL : http://tile.stamen.com/toner/6/10/25.png
## Map from URL : http://tile.stamen.com/toner/6/11/25.png
## Map from URL : http://tile.stamen.com/toner/6/9/26.png
## Map from URL : http://tile.stamen.com/toner/6/10/26.png
## Map from URL : http://tile.stamen.com/toner/6/11/26.png
USC <- data.frame(label = "USC", lon = -120, lat = 35)
UCLA <- data.frame(label = "UCLA", lon = -118.44, lat = 34.07)
# Map the lon/lat columns of each data frame instead of repeating the numbers inside aes()
ggmap(stamenmap.ca) +
  geom_point(data = USC, aes(x = lon, y = lat), color = "red", size = 5, alpha = .5) +
  geom_point(data = UCLA, aes(x = lon, y = lat), shape = 17, color = "red", size = 5, alpha = .5)
Interactive Bar Plot
f <- list(
family = "Courier New, monospace",
size = 18,
color = "#7f7f7f"
)
x <- list(
title = "Uber Options",
titlefont = f
)
y <- list(
title = "Highest Cost Estimate",
titlefont = f
)
p <- plot_ly(x = df$display_name, y = df$high_estimate, name = "Uber", type = "bar") %>%
  layout(xaxis = x, yaxis = y)
p
## Warning: package 'bindrcpp' was built under R version 3.4.4